{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# HoltWinters for Time Series Forecasting\n",
    "\n",
    "Holt-Winters is a time-series analysis technique, used in both forecasting future entries in a time series as well as in providing exponential smoothing, where weights are assigned against historical data with exponentially decreasing impact. It does this by analyzing three components of the data: level, trend, and seasonality. \n",
    "\n",
    "In order to convert your dataset to cudf format please read the cudf documentation on https://docs.rapids.ai/api/cudf/stable. For additional information on the ExponentialSmoothing model please refer to the documentation on https://rapidsai.github.io/projects/cuml/en/stable/api.html#cuml.ExponentialSmoothing\n",
    "\n",
    "This notebook will demonstrate how to forecast future datapoints using cuML's ExponentialSmoothing model by using weather data collected by NOAA at Raleigh-Durham International Airport between 2009 and today. It will predict weather patterns for the next 12 months, including temperature and precipitation. This notebook will also look at the accuracy of the algorithm as compared to statsmodels by using a train/test split on the average windspeed of the dataset, and finally run through some performance benchmarks on a mixture of different time series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import needed packages\n",
    "\n",
    "from cuml import ExponentialSmoothing as cuES\n",
    "import cudf\n",
    "import numpy as np\n",
    "import urllib.request\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "from statsmodels.tsa.holtwinters import ExponentialSmoothing as smES\n",
    "from sklearn.metrics import r2_score\n",
    "from time import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ignore warning messages\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import NOAA RDU Weather Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file_path = \"../cuml/data/weather/noaa_rdu.csv\"\n",
    "pdf = pd.read_csv(file_path, sep=';')\n",
    "pdf.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parse data and groupby month to accumulate average"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def date_helper(month):\n",
    "    if len(str(month)) == 1:\n",
    "        return '0'+str(month)\n",
    "    else:\n",
    "        return str(month)\n",
    "\n",
    "pdf['date'] = pd.to_datetime(pdf['date']).apply(lambda x : str(x.year) + '-' + date_helper(x.month))\n",
    "\n",
    "pdf = pdf.groupby('date').mean()\n",
    "pdf['month'] = pdf.index.map(lambda x : x[-2:])\n",
    "pdf.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examine Monthly Max/Min Temperature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(20,10))\n",
    "l1, = plt.plot(pdf['temperaturemax'], color = 'red')\n",
    "l2, = plt.plot(pdf['temperaturemin'], color = 'blue')\n",
    "plt.ylabel('Temperature')\n",
    "plot1 = plt.gca()\n",
    "plt.legend((l1, l2), ('Average Max Temp', 'Average Min Temp'), loc = 'best')\n",
    "plot1.axes.get_xaxis().set_ticks([])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Max/Min Temperature Over Past 2 Years"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(20,10))\n",
    "l1, = plt.plot(pdf.index[-24:], pdf['temperaturemax'].tail(24), color = 'red')\n",
    "l2, = plt.plot(pdf['temperaturemin'].tail(24), color = 'blue')\n",
    "plt.ylabel('Temperature')\n",
    "plt.xlabel('Date')\n",
    "plt.legend((l1, l2), ('Avg Daily Max', 'Avg Daily Min'))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Forecast Next 12 Months Using HoltWinters\n",
    "\n",
    "Now we'll use cuML's HoltWinters algorithm to predict the average max/min temperature as well as the precipitation for the next year. First we create a cuDF.Dataframe of the data we want to predict on:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = cudf.DataFrame.from_pandas(pdf[['temperaturemax','temperaturemin','precipitation']])\n",
    "data.to_pandas().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Begin by creating an `ExponentialSmoothing` object. Valid parameters include:\n",
    "\n",
    "- `endog`: the endogenous dataset in question\n",
    "- `seasonal` (default='additive'): whether the seasonality is \"additive\" or \"multiplicative\".\n",
    "- `seasonal_periods` (default=2): seasonality of the data. As our data is monthly, we set this to 12.\n",
    "- `start_periods` (default=2): number of seasons to be used for seasonal seed values.\n",
    "- `ts_num` (default=1): number of different time series in `endog`.\n",
    "- `eps` (default=2.24e-3): accuracy to which gradient descent should achieve.\n",
    "- `handle` (default=None): cuml.Handle. A new one is created if None is passed.\n",
    "\n",
    "The data in `endog` should be array-like, but can be any of type `cudf.dataframe`, `cupy.ndarray`, `numpy.ndarray`, or `cuda_array`. It is important to note that `cudf.dataframe` is column major, while the other types are row major. We then can fit the object to the data by using the `.fit()` method.\n",
    "\n",
    "Finally, use the `.forecast()` method to return a cudf.Series or cudf.DataFrame of your forecasted points. Predict takes `h`, the number of points you would like to forecast, and `index`, the index of the time series from which you want forecasted points. If `index = None`, then the forecast is done for all points and a cudf.DataFrame of the results is returned.\n",
    "\n",
    "- `h` (default=1): number of forecasted points to return.\n",
    "- `index` (default=None): index of the time series from data from which you want the forecasted points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "months_to_predict = 12\n",
    "\n",
    "cu_hw = cuES(data, seasonal_periods=12, ts_num=3)\n",
    "cu_hw.fit()\n",
    "cu_preds = cu_hw.forecast(months_to_predict)\n",
    "\n",
    "tempmax_pred = cu_preds[0]\n",
    "tempmin_pred = cu_preds[1]\n",
    "precip_pred = cu_preds[2]\n",
    "\n",
    "# Equivalent to:\n",
    "# tempmax_pred = cu_hw.forecast(months_to_predict, 0)\n",
    "# tempmin_pred = cu_hw.forecast(months_to_predict, 1)\n",
    "# precip_pred = cu_hw.forecast(months_to_predict, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Forecasted Points Alongside Original Data\n",
    "\n",
    "First, we compare how the forecasted points look when put alongside the original temperature data (just the last year, and then all data)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "next_six_months = ['2019-08', '2019-09', '2019-10', '2019-11',\n",
    "                   '2019-12', '2020-01', '2020-02', '2020-03',\n",
    "                   '2020-04', '2020-05', '2020-06', '2020-07',\n",
    "                   '2020-08']\n",
    "tempmax_pred = np.insert(tempmax_pred.to_array(), 0, pdf['temperaturemax'].iloc[-1])\n",
    "tempmin_pred = np.insert(tempmin_pred.to_array(), 0, pdf['temperaturemin'].iloc[-1])\n",
    "precip_pred = np.insert(precip_pred.to_array(), 0, pdf['precipitation'].iloc[-1])\n",
    "\n",
    "plt.figure(figsize=(20,10))\n",
    "l1, = plt.plot(pdf.index[-12:], pdf['temperaturemax'].tail(12), 'r-')\n",
    "l2, = plt.plot(pdf.index[-12:], pdf['temperaturemin'].tail(12), 'b-')\n",
    "p1, = plt.plot(next_six_months, tempmax_pred, 'r:')\n",
    "p2, = plt.plot(next_six_months, tempmin_pred, 'b:')\n",
    "plt.ylabel('Temperature')\n",
    "plt.xlabel('Date')\n",
    "plt.legend((l1, l2, p1, p2), ('Avg Max Temp', 'Avg Min Temp', 'Predicted Max Temp', 'Predicted Min Temp'))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(20,10))\n",
    "\n",
    "n = len(pdf.index)\n",
    "\n",
    "l1, = plt.plot(np.arange(n), pdf['temperaturemax'], 'r-')\n",
    "l2, = plt.plot(np.arange(n), pdf['temperaturemin'], 'b-')\n",
    "p1, = plt.plot(np.arange(months_to_predict+1)+n-1, tempmax_pred, 'r:')\n",
    "p2, = plt.plot(np.arange(months_to_predict+1)+n-1, tempmin_pred, 'b:')\n",
    "plt.ylabel('Temperature')\n",
    "plot1 = plt.gca()\n",
    "plt.legend((l1, l2, p1, p2), ('Avg Max Temp', 'Avg Min Temp', 'Predicted Max Temp', 'Predicted Min Temp'))\n",
    "plot1.axes.get_xaxis().set_ticks([])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Examine Precipitation Forecast\n",
    "\n",
    "Next we examine the forecast for a less \"nice\" dataset: the precipitation levels over the past years."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(20,10))\n",
    "l1, = plt.plot(np.arange(n), pdf['precipitation'], 'b-')\n",
    "p1, = plt.plot(np.arange(months_to_predict+1)+n-1, precip_pred, 'b:')\n",
    "plt.ylabel('Precipitation')\n",
    "plot1 = plt.gca()\n",
    "plt.legend((l1, p1), ('Avg', 'Predicted'))\n",
    "plot1.axes.get_xaxis().set_ticks([])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Compare Accuracy to StatsModels on Windspeed Dataset\n",
    "\n",
    "The last thing we'll do with our NOAA dataset is to compare the accuracy of cuML's ExponentialSmoothing with statsmodels' ExponentialSmoothing. To do this, we'll take the windspeed data, do a train/test split, and take the r2 scores of both statsmodels' forecasted points and cuML's forecasted points as compared to the test points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get Windspeed data from dataframe\n",
    "wind_data = pdf['avgwindspeed'].values\n",
    "\n",
    "# Test/Train Split\n",
    "spl = 0.2\n",
    "wind_data = np.asarray(wind_data, dtype=np.float64)\n",
    "h = int(wind_data.shape[0]*spl)\n",
    "train = wind_data[:-h]\n",
    "test = wind_data[-h:]\n",
    "\n",
    "\n",
    "# cuML HoltWinters\n",
    "cu_hw = cuES(train, seasonal='additive', seasonal_periods=12)\n",
    "cu_hw.fit()\n",
    "cu_pred = cu_hw.forecast(h)\n",
    "\n",
    "# statsmodels ExponentialSmoothing\n",
    "sm_hw = smES(train, seasonal='additive', seasonal_periods=12)\n",
    "sm_hw = sm_hw.fit()\n",
    "sm_pred = sm_hw.forecast(h)\n",
    "\n",
    "cu_r2 = r2_score(test, cu_pred)\n",
    "sm_r2 = r2_score(test, sm_pred)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(20,10))\n",
    "\n",
    "n = int(0.4*train.shape[0])\n",
    "\n",
    "test = np.insert(test, 0, train[-1])\n",
    "cu_pred = np.insert(cu_pred.to_array(), 0, train[-1])\n",
    "sm_pred = np.insert(sm_pred, 0, train[-1])\n",
    "\n",
    "tr_d, = plt.plot(np.arange(n), train[-n:], 'r-')\n",
    "te_d, = plt.plot(np.arange(h+1)+n-1, test, 'r:')\n",
    "cu, = plt.plot(np.arange(h+1)+n-1, cu_pred, 'b:')\n",
    "sm, = plt.plot(np.arange(h+1)+n-1, sm_pred, 'g:')\n",
    "plt.ylabel('Wind Speed')\n",
    "plot1 = plt.gca()\n",
    "plt.legend((tr_d, te_d, cu, sm), ('Training Data', 'Test Data', 'cuML', 'statsmodel'))\n",
    "plot1.axes.get_xaxis().set_ticks([])\n",
    "plt.show()\n",
    "\n",
    "print('cuML accuracy score: ' + str(cu_r2))\n",
    "print('statsmodels accuracy score: ' + str(sm_r2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Comparing Speed to Statsmodels\n",
    "\n",
    "Finally, we'll see how statsmodels and cuML compare on their speed of execution.\n",
    "\n",
    "## Time Series Generator\n",
    "To do this, we'll need a way of creating a large number of datasets. Here, we define functions to artificially create trends (which implicitly defines a level), seasons (via sine waves), and Gaussian noise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def trend(x, m=1, b=0):\n",
    "    return m*x + b\n",
    "\n",
    "def sine_season(x, fs=100, f=2, amp=1):\n",
    "    return amp * np.sin(2*np.pi*f * (x/fs))\n",
    "\n",
    "def normal_noise(scale=1, size=1):\n",
    "    return np.random.normal(scale=scale, size=size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By combining them, we can create artificial time series on which to test our execution speed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_timeseries_components(fs=100, f=4, m=1, b=0, amp=1, scale=1):\n",
    "    x = np.arange(fs)\n",
    "    t = trend(x, m, b)\n",
    "    s = sine_season(x, fs, f, amp)\n",
    "    n = normal_noise(scale, fs)\n",
    "    return (t, s, n)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t, s, n = get_timeseries_components(100, 4, 1, 0, 20, 2.5)\n",
    "a = t + s + n\n",
    "split = 75\n",
    "train_a = a[:split]\n",
    "test_a = a[split:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(train_a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## cuES and smES on Our Time Series Generator\n",
    "We now can examine how long it takes to fit our cuML model (cuES) as well as our statsmodels model (smES) on the above time series:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "cu = cuES(train_a, seasonal_periods=25, start_periods=2, ts_num=1, eps=2.24e-7, seasonal='add')\n",
    "cu = cu.fit()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "sm = smES(train_a, seasonal_periods=25, seasonal='add')\n",
    "sm = sm.fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also see how well these models forecast future points in our time series, both visually as well as by again looking at the r2 score as compared to the test split."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cu_f = cu.forecast(25)\n",
    "sm_f = sm.forecast(25)\n",
    "plt.plot(np.arange(0, split), train_a, color='b')\n",
    "plt.plot(np.arange(split, 100), cu_f.to_pandas(), color='r')\n",
    "plt.plot(np.arange(split, 100), test_a, color='b', linestyle='dashed')\n",
    "plt.plot(np.arange(split, 100), sm_f, color='g')\n",
    "plt.legend(['train', 'cuml', 'test', 'statsmodels'])\n",
    "plt.title(\"cuml vs statsmodels predictions on sine waves\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r2_score(test_a, cu_f), r2_score(test_a, sm_f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## cuES Over Multiple Generated TS\n",
    "\n",
    "We now demonstrate that the same process could be used to generate multiple unique time series with different frequencies, amplitude, and noise. By putting the time series into a single DataFrame, we can use cuES to fit over all series at the same time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ta, sa, na = get_timeseries_components(m=1, amp=20, scale=2.5)\n",
    "tb, sb, nb = get_timeseries_components(m=0.5, amp=20, scale=5)\n",
    "tc, sc, nc = get_timeseries_components(m=0, amp=20, scale=3.3)\n",
    "a = ta + sa + na\n",
    "b = tb + sb + nb\n",
    "c = tc + sc + nc\n",
    "split = 75\n",
    "train_a, train_b, train_c = a[:split], b[:split], c[:split]\n",
    "test_a, test_b, test_c = a[split:], b[split:], c[split:]\n",
    "train_df = cudf.DataFrame({\"a\":train_a, \"b\":train_b, \"c\":train_c})\n",
    "test_df = cudf.DataFrame({\"a\":test_a, \"b\":test_b, \"c\":test_c})\n",
    "plt.plot(train_df.to_pandas())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "cu = cuES(train_df, seasonal_periods=25, start_periods=3, ts_num=3, eps=2.24e-7, seasonal='additive')\n",
    "cu = cu.fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Comparison Numbers\n",
    "\n",
    "Now for the final comparisons: we'll use the above generator to create a large number of series, stick them into a DataFrame, and compare the time it takes to run cuES over the DataFrame, or equivalently, smES over each column. We begin by creating the DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize parameters:\n",
    "np.random.seed(100)\n",
    "single_series_len = 1250\n",
    "num_of_series = 1000\n",
    "tt_split = int(0.8*num_of_series)\n",
    "train_pdf = pd.DataFrame()\n",
    "test_pdf = pd.DataFrame()\n",
    "\n",
    "#Create Dataframe of series:\n",
    "for i in range(num_of_series):\n",
    "    t, s, n = get_timeseries_components(fs=single_series_len,\n",
    "                                        f = 4,\n",
    "                                        m = np.random.uniform(-2, 2),\n",
    "                                        b = np.random.uniform(-3, 3),\n",
    "                                        amp = np.random.uniform(-1, 1),\n",
    "                                        scale = np.random.uniform(1, 3))\n",
    "    time_series = t + s + n\n",
    "    train_pdf[i] = time_series[:tt_split]\n",
    "    test_pdf[i] = time_series[tt_split:]\n",
    "    \n",
    "train_pdf[train_pdf.columns[:5]].head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we time how long it takes for us to fit both cuES and smES over all the columns of our DataFrame. Note that on several occasions, statsmodels fails to converge, which could possibly accelerate its computation time (exiting earlier)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Raw time comparisons\n",
    "\n",
    "train_gdf = cudf.from_pandas(train_pdf)\n",
    "cu_st = time()\n",
    "cu = cuES(train_gdf, seasonal_periods=int(single_series_len/4), seasonal='add', ts_num=num_of_series)\n",
    "cu.fit()\n",
    "cu_et = time()\n",
    "\n",
    "sm_st = time()\n",
    "for column in train_pdf:\n",
    "    sm = smES(train_pdf[column], seasonal_periods=int(single_series_len/4), seasonal='add')\n",
    "    sm = sm.fit()\n",
    "sm_et = time()\n",
    "\n",
    "print(\"For \" + str(num_of_series) + \" columns of length \" + str(single_series_len)\n",
    "      + \" it takes cuml \" + str(cu_et - cu_st) + \" seconds to fit.\")\n",
    "print()\n",
    "print(\"For \" + str(num_of_series) + \" columns of length \" + str(single_series_len)\n",
    "      + \" it takes statsmodels \" + str(sm_et - sm_st) + \" seconds to fit.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we wanted to, we could also time predictions and compare their r2 score to the test data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_of_series = 100\n",
    "\n",
    "#take only 100 samples\n",
    "train_pdf = train_pdf[train_pdf.columns[:num_of_series]]\n",
    "test_arr = test_pdf[test_pdf.columns[:num_of_series]].values.transpose()\n",
    "num_of_preds = test_arr.shape[1]\n",
    "\n",
    "#timing for cuES to forecast points from each time series\n",
    "train_gdf = cudf.from_pandas(train_pdf)\n",
    "cu_st = time()\n",
    "cu = cuES(train_gdf, seasonal_periods=int(single_series_len/4), seasonal='add', ts_num=num_of_series)\n",
    "cu.fit()\n",
    "cu_preds = cu.forecast(num_of_preds)\n",
    "cu_et = time()\n",
    "\n",
    "#timing for smES to forecast points from each time series\n",
    "sm_preds = np.zeros((num_of_series, num_of_preds))\n",
    "sm_st = time()\n",
    "for i in range(len(train_pdf.columns)):\n",
    "    sm = smES(train_pdf[train_pdf.columns[i]], seasonal_periods=int(single_series_len/4), seasonal='add')\n",
    "    sm = sm.fit()\n",
    "    sm_preds[i] = sm.forecast(num_of_preds)\n",
    "sm_et = time()\n",
    "\n",
    "cu_preds = cu_preds.as_matrix().transpose()\n",
    "cu_r2_scores = r2_score(test_arr, cu_preds)\n",
    "sm_r2_scores = r2_score(test_arr, sm_preds)\n",
    "\n",
    "print(\"Forecasting \" + str(num_of_preds) + \" points for \" + str(num_of_series) + \" different time series:\")\n",
    "print(\"cuES: \" + str(cu_et - cu_st) + \" s\")\n",
    "print(\"smES: \" + str(sm_et - sm_st) + \" s\")\n",
    "print()\n",
    "print(\"Average cuES r2 score: \" + str(np.mean(cu_r2_scores)))\n",
    "print(\"Average smES r2 score: \" + str(np.mean(sm_r2_scores)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Additional Notes and Warnings:\n",
    "\n",
    "**Known Limitations**:\n",
    "    This version of ExponentialSmoothing currently provides only a limited\n",
    "    number of features when compared to the\n",
    "    statsmodels.holtwinters.ExponentialSmoothing model. Noticeably, it lacks:\n",
    "- `.predict()` : no support for in-sample prediction.\n",
    "                 (https://github.com/rapidsai/cuml/issues/875)\n",
    "- `.hessian()` : no support for returning Hessian matrix.\n",
    "                 (https://github.com/rapidsai/cuml/issues/880)\n",
    "- `.information()` : no support for returning Fisher matrix.\n",
    "                     (https://github.com/rapidsai/cuml/issues/880)\n",
    "- `.loglike()` : no support for returning Log-likelihood.\n",
    "                 (https://github.com/rapidsai/cuml/issues/880)\n",
    "                         \n",
    "Additionally, be warned that there may exist floating point instability\n",
    "issues in this model. Small values in endog may lead to faulty results.\n",
    "See https://github.com/rapidsai/cuml/issues/888 for more information.\n",
    "    \n",
    "**Known Differences**:\n",
    "This version of ExponentialSmoothing differs from statsmodels in some\n",
    "other minor ways:\n",
    "- `.__init__()` : Cannot pass trend component or damped trend component\n",
    "- `.__init__()` : this version can take additional parameters (eps, start_periods, ts_num, handle)\n",
    "- `.score()` : returns SSE rather than gradient logL\n",
    "               (https://github.com/rapidsai/cuml/issues/876)\n",
    "- this version provides `get_level()`, `get_trend()`, `get_season()`"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
